-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Improve the unresolved-import
check
#13007
Conversation
7883c7a
to
96dbfdd
Compare
|
96dbfdd
to
36dca2a
Compare
@@ -977,11 +996,34 @@ impl<'db> TypeInferenceBuilder<'db> { | |||
// the runtime error will occur immediately (rather than when the symbol is *used*, | |||
// as would be the case for a symbol with type `Unbound`), so it's appropriate to | |||
// think of the type of the imported symbol as `Unknown` rather than `Unbound` | |||
let ty = module_ty | |||
let member_ty = module_ty | |||
.unwrap_or(Type::Unbound) | |||
.member(self.db, &Name::new(&name.id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still find it odd that we have to do the "unbound" check everywhere where we call .member
. Do we not always want to emit a diagnostic when failing to resolve a member?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, but it will need to be different error codes. It would be very counterintuitive for users if import foo
triggers an unresolved-import
error code and from foo import bar
triggers an unresolved-member
error code. Users will expect all import-related failures to be the same error code and all attribute-access-related failures to be a different error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. We might just want to parameterize the member
method. But maybe that's not worth it? I don't know. Have you looked at how mypy/pyright do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might just want to parameterize the
member
method.
Not sure exactly what you mean; could you clarify?
Have you looked at how mypy/pyright do this?
Mypy has a very different architecture to the one we're building. It resolves all imports eagerly in a first pass before it does any other type checking; module-not-found
errors are emitted during this first pass here: https://github.com/python/mypy/blob/fe15ee69b9225f808f8ed735671b73c31ae1bed8/mypy/build.py#L2600-L2700.
I'm not as familiar with pyright's codebase but I can dig in now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... it seems both mypy and pyright in fact do what you'd prefer here -- they both use "attribute-access" error codes for from x import y
imports:
- pyright: https://pyright-play.net/?pyrightVersion=1.1.368&code=GYJw9gtgBAxmA28CmMAuBLMA7AzgOgEMAjGKdCABzBFSmDDCA
- mypy: https://mypy-play.net/?mypy=latest&python=3.12&gist=3110174b575f8d8eba319462c45a8ddf
Pyright's error code is reportAttributeAccessIssue
; mypy's is attr-defined
, which both seem equally bad. In terms of error messages, pyright definitely wins, though: pyright has:
"foo" is unknown import symbol
whereas mypy has
Module "collections.abc" has no attribute "foo"
I'm somewhat surprised by this. But given this precedent, I'm okay with emitting the diagnostic from .member()
, if you'd prefer. Though I hope we still aspire to be more user-friendly in our error messages than either mypy or pyright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I agree that that makes a lot of sense! I can make that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... if the inferred type of a member access is a union, e.g. int | Unbound
, would you expect that to be represented as an Ok()
or an Err()
variant (if we return an Result
from .member()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having a closure or similar to create a diagnostic if the member is unbound.
This might be doable, but after looking at it for a little bit I think it would still be quite a large refactor and design change. I think it's out of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced there is any problem with reporting an error on from foo import bar
(where the module foo
does exist, but has no top-level symbol bar
) as an attribute-access error code. If I had to pick between the pyright or mypy error messages above, I would prefer the mypy one (the pyright one gives less useful context). The fact that the attribute access occurred as part of a from-import will be clear from the error location; it doesn't have to be part of the error message or code. I wouldn't want to make design decisions based on the hypothesis that this is confusing to users without clear evidence that it is (e.g. a history of users filing issues on mypy about that error code or message); personally I find it intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should return an Option
or Result
from member()
, for the reason @AlexWaygood mentions above: we can have a possibly-unbound name (currently represented as a union with Unbound), and it's not clear how that should look in a binary representation.
We could design our own representation for "Type plus definitely-bound/maybe-unbound/definitely-unbound state" and use that instead of Type::Unbound
. This will complicate some APIs and I think be somewhat less efficient, but it would have the advantage that it would force handling Unbound early, which I think is desirable. In general I don't think we want Unbound escaping from the scope where it originated; it should instead result in a diagnostic and be eliminated from the type (replaced with Unknown
if we have no other type information; probably just disappear from the type if we do.) I do think this alternative to Type::Unbound
is worth exploring and seeing what kind of impact it will have.
I was thinking that we would push diagnostics inside member()
to simplify the handling of Unbound, and I think I still favor that approach. If we do want to make distinctions in error codes/messages based on context from the caller of member()
, I think we can always pass in context to make that possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still have preferred to implement a holistic solution to how we handle unresolved members rather than implementing a one off solution because I'm worried that we (I for example) end up copying this pattern. That was also the reason why I intentionally didn't implement this diagnostic in my original PR.
I'm okay with merging if you find the parity important but I ask you to at least add a TODO comment to make it clear that this pattern should not be copied.
Well, so would I! #12986 was my attempt 😆 The idea behind this PR was to try to implement as many of the fixes implemented there as possible without fundamentally reworking red-knot's design, so that it would be clearer exactly what the pros and cons of doing so would be.
I'll add this |
Update crates/red_knot_python_semantic/src/types/infer.rs Co-authored-by: Micha Reiser <micha@reiser.io>
7e907f8
to
ea3a449
Compare
CodSpeed Performance ReportMerging #13007 will improve performances by 4.16%Comparing Summary
Benchmarks breakdown
|
ea3a449
to
fce1b32
Compare
fce1b32
to
2c1fb1d
Compare
let b_file_diagnostics = super::check_types(&db, b_file); | ||
assert_diagnostic_messages( | ||
&b_file_diagnostics, | ||
&["Could not resolve import of 'thing' from 'a'"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I think describing it as a failed attribute access would actually be much clearer than the vague "Could not resolve" which doesn't really communicate anything! This message also doesn't currently clarify that 'a'
is a module (though it could be taken as implied.) I actually can't think of an error message here that I think would be better than "Module a
has no attribute thing
."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do people really think of from collections import deque
as an "attribute access" of deque
from the collections
module? I certainly don't. Of course that's what's happening under the hood, but even at runtime this detail is hidden -- the import fails with ImportError
rather than AttributeError
, and there's no mention of attempting to access an attribute:
% python -c "from collections import foo"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: cannot import name 'foo' from 'collections' (/Users/alexw/.pyenv/versions/3.12.4/lib/python3.12/collections/__init__.py)
A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \ | ||
despite the symbol existing in the symbol table for `a.py`"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think when we issue the error on the import in a.py
, we should set the type of foo
in a
to Unknown
, not Unbound
, which should avoid this?
&self, | ||
tail: Option<&str>, | ||
level: NonZeroU32, | ||
) -> Result<ModuleName, ModuleResolutionError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just emit the diagnostics inside this function, return an Option
, and avoid the need for ModuleResolutionError
?
module.unwrap_or_default() | ||
), | ||
); | ||
} else if module_ty.is_ok() && member_ty.is_unknown() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be emitting a diagnostic at all if the imported symbol is of type Unknown
. Unknown
might result from an error in the other module, but in that case the diagnostic should occur in the other module; importing a value of unknown type is not an error.
I think just removing this clause would fix the ignored test; what would it break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the only reason we have to check for Unknown here is that we intentionally replaced Unbound with Unknown just a few lines above. If we wait to do it after, the bug in the ignored test goes away, and all tests pass:
diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs
index 173c957d1..c4ff40220 100644
--- a/crates/red_knot_python_semantic/src/types.rs
+++ b/crates/red_knot_python_semantic/src/types.rs
@@ -451,9 +451,6 @@ mod tests {
);
}
- #[ignore = "\
-A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
-despite the symbol existing in the symbol table for `a.py`"]
#[test]
fn resolved_import_of_symbol_from_unresolved_import() {
let mut db = setup_db();
diff --git a/crates/red_knot_python_semantic/src/types/infer.rs b/crates/red_knot_python_semantic/src/types/infer.rs
index 9b183727e..28dc8f2e3 100644
--- a/crates/red_knot_python_semantic/src/types/infer.rs
+++ b/crates/red_knot_python_semantic/src/types/infer.rs
@@ -1012,15 +1012,9 @@ impl<'db> TypeInferenceBuilder<'db> {
asname: _,
} = alias;
- // If a symbol is unbound in the module the symbol was originally defined in,
- // when we're trying to import the symbol from that module into "our" module,
- // the runtime error will occur immediately (rather than when the symbol is *used*,
- // as would be the case for a symbol with type `Unbound`), so it's appropriate to
- // think of the type of the imported symbol as `Unknown` rather than `Unbound`
let member_ty = module_ty
.unwrap_or(Type::Unbound)
- .member(self.db, &Name::new(&name.id))
- .replace_unbound_with(self.db, Type::Unknown);
+ .member(self.db, &Name::new(&name.id));
if matches!(module_ty, Err(ModuleResolutionError::UnresolvedModule)) {
self.add_diagnostic(
@@ -1032,7 +1026,7 @@ impl<'db> TypeInferenceBuilder<'db> {
module.unwrap_or_default()
),
);
- } else if module_ty.is_ok() && member_ty.is_unknown() {
+ } else if module_ty.is_ok() && member_ty.is_unbound() {
self.add_diagnostic(
AnyNodeRef::Alias(alias),
"unresolved-import",
@@ -1044,6 +1038,8 @@ impl<'db> TypeInferenceBuilder<'db> {
);
}
+ let member_ty = member_ty.replace_unbound_with(self.db, Type::Unknown);
+
self.types.definitions.insert(definition, member_ty);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think you're right that this fixes the tests I added (and skipped) in this PR. Here's an interesting edge case that you might not have considered, though. What should we do for something like this?
# foo.py
for x in range(0):
pass
# bar.py
from foo import x
Running python bar.py
fails with ImportError
at runtime because x
is not defined in foo.py
(due to the iterable being empty), but neither mypy nor pyright catches this. Arguably we should infer the type of x
after the for
loop has completed as being int | Unbound
-- should we flag any union that includes Unbound
as one of the elements with an unresolved-import
diagnostic? Probably not, I don't think -- the import might succeed, it just also might not. So let's say (for argument's sake) we use the error code import-possibly-unbound
for that.
So then what if we have something like this, where the symbol definitely exists in foo.py
, but is also definitely unbound? (Mypy and pyright do both catch this one.) Should that have the error code unresolved-import
, or import-possibly-unbound
?
# foo.py
x: int
# bar.py
from foo import x
These are somewhat obscure edge cases, so I think what you propose here is a strict improvement on what I implemented in this PR; I'll make a fixup PR to address your suggestions. But my point is that we may still at some point need some context propagated regarding the cause of the error -- whether that error is represented as Unknown
or Unbound
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, great points and great questions! I had the x: int
case in mind last night when thinking again about representation of unbound-ness in a different comment on this PR.
I agree that maybe-unbound is a different error class than definitely-unbound (and arguable whether maybe-unbound should trigger an error on import at all.)
I think we do want a representation of "definitely unbound and typed" that we use for imports, so e.g. we can error if you import from x: int
(as long as it's not a stub), but we can still type check the importing module treating x
as int
rather than Unknown
. (Not sure how important this is, but it seems marginally better.) I think this can be handled by declared vs inferred types.
What's less clear to me is if we need a representation of "definitely unbound and typed" for inferred types. It would mean that within a scope we could also check code following x: int
and emit diagnostics both for "x is not bound" and for use of x that isn't valid for an int. Maybe that's useful?
Like I mentioned above in a different comment, I'm pretty open to exploring other representations of unbound to make it orthogonal to types; the main question for me is if we can avoid it regressing in perf, and if not, do we gain enough from it to be worth the regression?
} | ||
&self, | ||
module_name: ModuleName, | ||
) -> Result<Type<'db>, ModuleResolutionError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the advantage of returning a Result
vs just emitting the diagnostic directly in this method, as the previous code did.
While I'm making notes on this PR of things that I'd like to address in this area: we've now started to accumulate some tests in |
I think the idea is that the tests in |
My initial idea was to only test that |
Ah! Yes, I see that distinction does seem to apply currently. I think longer-term it doesn't necessarily make sense to split tests according to this distinction, though. At some point I suspect ~all of our tests will be tests of diagnostics, which can include a diagnostic from |
Summary
This PR improves the
unresolved-import
check so that it once again is able to recognise unresolvedimport from
imports. It adds the same tests as #12986, but doesn't add theUnknownTypeKind
enum that that PR introduces.I've had to comment out some assertions in the tests for now. One case that passes with #12986 but doesn't work with this PR is this:
src/a.py
:import foo as foo
src/b.py
:from a import foo
This should only cause an
unresolved-import
diagnostic to be emitted ona.py
, and this is the case with #12986, but with this PRunresolved-import
diagnostics are emitted on both files.Another case that passes with #12986 but not with this PR is the case where a symbol that exists, but has an inferred type of
Unknown
, is imported from another module. This should not trigger anunresolved-import
diagnostic, but does with this PR, as it's hard to determine the root cause of theUnknown
type.Overall, this PR feels to me like it's an improvement on the status quo, but inferior to #12986. It's also a less significant change than #12986, however, since it's not a significant change to the design of how we think about types in red-knot, so the idea is that it should be possible to land this and then rebase #12986 on top of it.
Test Plan
Several tests added to
crates/red_knot_python_semantic/src/types.rs
andcrates/red_knot_python_semantic/src/types/infer.rs
. The assertions in the benchmarks have also been updated.